Collaborative Dialogue for Controlling Autonomous Systems
Abstract
We claim that a natural dialogue interface to a semi-autonomous intelligent agent has important advantages, especially when operating in real-time complex dynamic environments involving multiple concurrent tasks and activities. We discuss some of the requirements of such a dialogue interface, and describe some of the features of a working system built at CSLI, focusing on the data structures and techniques used to manage multiple interleaved threads of conversation about concurrent activities and their execution status.

Dialogue Interfaces for Autonomous Systems

We believe there is a strong case for natural dialogue interfaces for autonomous systems performing complex tasks in unpredictable dynamic environments. Some authors (e.g. (Shneiderman 2000)) have argued against speech or natural language interfaces for such systems, as opposed to a graphical interface. However, we agree with many of the points made by Allen et al. (2001): in particular, they argue that for increasingly many of the more complex tasks, GUIs become infeasible due to the complexity of the device and the activities it is required to perform. It is important to clarify what we mean by “dialogue”: a dialogue interface is not simply a matter of adding a speech recognition and generation component to a device (although Allen et al. make the point that even this can enhance an existing GUI). Dialogue is a truly collaborative process between two (or more) participants (in the case we’re interested in, a human operator and a robot or other agent) whereby references and tasks are negotiated and agreed on, often in an incremental manner. Following Allen et al., we claim that collaborative natural dialogue offers a powerful medium for interaction between humans and intelligent devices.
[Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved. This work was supported in part by grant number N00014-02-1-0417 from the Department of the Navy for “High-Level Control of Human-UAV Teams”.]

This is particularly the case for complex systems operating in dynamic, real-time environments, containing multiple concurrent activities and events which may succeed, fail, become cancelled or revised, or otherwise warrant discussion. Human dialogue in such scenarios is highly collaborative (Clark 1996), with much negotiation between the instructor and the task-performer: dialogue is used to specify and clarify goals and tasks, to monitor progress, and to negotiate the joint solution of any problems. Further, an interface to a device operating in such conditions must be interruptible, context-dependent, and otherwise extremely amenable to multiple threads of conversation. Moreover, we argue that natural dialogue interfaces have important human-centered advantages over purely-GUI interfaces, including:

- The amount of required specialized training is reduced, allowing task experts (e.g. pilots, for UAVs) to be used as operators of the autonomous systems.
- Even though some adaptation to the interface will generally be needed, much of the interface’s power comes from the natural mode of interaction;
- Natural language is an efficient medium for communication and interaction; human communicators are used to making use of previous dialogue context and referring to previously introduced items in a conversational context;
- The collaborative nature of dialogue allows the operator to give timely succinct instructions, further embellishing them if so requested by the agent;
- A speech interface allows hands-free operation; this is critical for controlling intelligent devices while, say, operating a vehicle, or provides an important separate modality of interaction for a robot or autonomous device being (semi-)controlled using some other mode (such as a joystick or remote control);
- The cognitive load on the human operator is lessened, since there is less need to focus on the interface itself and its usage; this is critical when the operator-agent team is involved in complex tasks requiring high reactivity, and especially so as we move to situations where a single operator may control multiple agents.

This is not to claim that a GUI interface is not extremely useful; indeed, we believe (as do Allen et al.) that GUI and dialogue interfaces can be natural supplements to each other—in particular, a GUI can serve as a shared representation of the environment under discussion, allowing simple gestures to be incorporated into the dialogue process (e.g. pointing by the human, highlighting objects by the agent).

Robust Natural Dialogue Interfaces

Allen et al. (2001) discuss some of the dialogue phenomena that must be handled as task and domain complexity increases.
Of greatest interest and relevance to our focus are the following:

- Mixed-initiative interaction: the autonomous agent must be able both to respond to the human operator and to initiate its own thread of conversation: agent-initiated conversation may arise from its perceptions of a changing environment, or the execution status of its tasks or activities;
- Collaborative negotiation sub-dialogues, to refine a command, request, or proffered information: e.g. a human-given command may be ambiguous or not fully specified, or the agent may note that a particular request may be impossible or sub-optimal to perform;
- Different epistemic/temporal modalities, e.g. for distinguishing between the current state of the world and a planned one: this allows the user and agent to discuss the viability of future tasks or plans to achieve a goal;
- Context-dependent interpretation of all interactions: i.e. the preceding dialogue provides an important context for the most current utterance; in particular, noun phrases are often resolved or disambiguated by referring to previous dialogue, an important aspect of the “efficiency” of natural language as a medium of interaction.

To these important properties we add the following, which provide much of the focus for our discussion below:

- Ability to handle dialogues about multiple concurrent tasks in a coherent and natural manner: many conversations between humans have this property, and dialogues between humans and (semi-)autonomous agents will have this feature in as much as such agents are able to carry out activities concurrently;
- Generation of natural speech, i.e. what to say and when to say it: it is important not to overload the human operator with irrelevant information, and to present all information as succinctly as possible, particularly in highly dynamic environments.
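Context-dependent interpretation of the kind described above is often implemented with a recency-ordered salience list of previously mentioned entities. The following is a minimal sketch of that idea; the names (Entity, DialogueContext, resolve) are illustrative assumptions, not the actual API of the system discussed here:

```python
# Minimal sketch of noun-phrase resolution against a salience list
# ordered by recency of mention. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    phrase: str     # e.g. "a red car"
    category: str   # e.g. "vehicle"

@dataclass
class DialogueContext:
    salience: List[Entity] = field(default_factory=list)  # most recent first

    def mention(self, entity: Entity) -> None:
        # Each new mention becomes the most salient candidate referent.
        self.salience.insert(0, entity)

    def resolve(self, category: str) -> Optional[Entity]:
        # "it" or "the vehicle" resolves to the most salient entity of
        # the requested category, i.e. the most recently mentioned one.
        return next((e for e in self.salience if e.category == category), None)

ctx = DialogueContext()
ctx.mention(Entity("the tower", "location"))
ctx.mention(Entity("a red car", "vehicle"))
assert ctx.resolve("vehicle").phrase == "a red car"
assert ctx.resolve("location").phrase == "the tower"
```

Recency is, of course, only one of several salience factors used in practice, but it illustrates why the preceding dialogue makes natural language such an efficient medium.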
(See (Suwa & Tversky 2002) for a discussion of the importance of the role of shared representations.)

These two issues are intertwined: the task of generating natural responses that are understood by the human with minimal cognitive effort is complicated when multiple threads of discussion are available. In this scenario, the problem of establishing and maintaining appropriate context in a natural way becomes difficult. Generation is also complicated by information becoming available in real time, from different sources, involving the risk of overloading the operator with irrelevant or highly repetitious utterances. In general, the system should appear as ‘natural’ as possible from the user’s point of view, including using the same language as the user if possible (“echoing”), using anaphoric referring expressions where possible, and aggregating utterances where appropriate. Further, another desirable feature is that the system’s generated utterances should be in the coverage of the dialogue system’s speech recognizer, so that system-generated utterances effectively prime the user to speak in-grammar.

The CSLI dialogue system addresses these issues using techniques that we describe below, including:

- dialogue context representation to support collaborative activities and concurrent tasking;
- modeling activities to support dialogues for task monitoring and collaborative planning;
- dialogue and conversation management to support free and natural communication over multiple conversation topics;
- natural generation of messages in multi-tasking collaborative dialogues.

Sample Application and Dialogue

The CSLI dialogue system has been applied to multiple applications. For the purposes of illustration here, we describe the WITAS UAV (‘unmanned aerial vehicle’) application—a small robotic helicopter with on-board planning and deliberative systems, and vision capabilities (for details see e.g. (Doherty et al.
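One way to picture the activity modeling mentioned above is as a tree of activities annotated with execution status, which the dialogue manager can query to answer status questions or to carry out revisions. The sketch below is an illustrative assumption about such a representation, not the CSLI system's actual data structures:

```python
# Sketch of an activity model supporting status queries and revision.
# Class, state, and method names are hypothetical.
from enum import Enum

class Status(Enum):
    PLANNED = "planned"
    CURRENT = "current"
    DONE = "done"
    CANCELLED = "cancelled"

class Activity:
    def __init__(self, description: str, status: Status = Status.PLANNED):
        self.description = description
        self.status = status
        self.children: list["Activity"] = []

    def add(self, child: "Activity") -> "Activity":
        self.children.append(child)
        return child

    def cancel(self) -> None:
        # Cancelling an activity also cancels its unfinished
        # sub-activities, so a report such as "I have cancelled flying
        # to waypoint one" covers the whole subtree.
        self.status = Status.CANCELLED
        for c in self.children:
            if c.status in (Status.PLANNED, Status.CURRENT):
                c.cancel()

    def with_status(self, status: Status) -> list[str]:
        # Collect matching descriptions across the tree, e.g. to answer
        # "What are you doing?" (CURRENT) or "What will you do next?" (PLANNED).
        found = [self.description] if self.status == status else []
        for c in self.children:
            found += c.with_status(status)
        return found

mission = Activity("mission")
mission.add(Activity("search for a red car", Status.CURRENT))
fly = mission.add(Activity("fly to waypoint one", Status.CURRENT))
fly.cancel()
mission.add(Activity("fly to the tower", Status.CURRENT))
assert mission.with_status(Status.CURRENT) == ["search for a red car", "fly to the tower"]
```

Keeping execution status in one shared structure is what lets a single query yield a coherent answer across concurrent tasks.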
2000)), although the current implementation interacts with a simulated UAV. (See http://www.ida.liu.se/ext/witas for the WITAS project.) In this application, mission goals are provided by a human operator, and an on-board planning system then responds. While the helicopter is airborne, an on-board active vision system interprets the scene below to identify ongoing events, which may be reported (via NL generation) to the operator. The robot can carry out various “activities” such as flying to a location, following a vehicle, or landing. These activities are specified by the user during dialogue, or can be initiated by the UAV’s on-board AI. In any case, a major component of the dialogue, and a way of maintaining its coherence, is tracking the state of current or planned activities of the device.

A more interesting and problematic notion is that of “joint activities” between the UAV and a human operator. These are activities which the autonomous system cannot complete alone, but which require some human intervention. In our current scenarios, the UAV’s vision system is not good enough to determine whether a particular vehicle is the one sought after, and only the human operator has the authority to determine this, so that human and robot must collaborate in order to find and track a vehicle. The dialogue in Figure 1 shows how a typical interaction works (other capabilities, such as clarification subdialogues, are covered in (Lemon et al. 2001)). Note here that the user is able to make explicit queries about the robot’s activities (both current and future), that there are concurrent activities, and that conversational initiative centers around the joint activities currently being specified and executed.

Figure 1: A demonstration of the CSLI dialogue system

Multi-modal Utterances | Dialogue Move
Operator (O): Our job is to look for a red car | Command (Joint Activity)
UAV (U): Ok. I am looking for one. | Report (Confirm Activity)
O: Fly here please. [+click on map] | Command (Deictic)
U: Okay. I will fly to waypoint one. | Report (Confirm Activity)
U: Now taking off and flying there. | Report (Current Activity)
O: Stop that. Go to the tower instead. | Command, Revision
U: I have cancelled flying to waypoint one. I will fly to the tower. | Report (Activity Status)
O: What are you doing? | Wh-question (Current Activity)
U: I am searching for a red car and flying to the tower. | Answer (Current Activity)
O: What will you do next? | Wh-question (Planned Activity)
U: I have nothing planned. | Answer (Planned Activity)
U: I see a red car on main street. [display on map, show video images] Is this the right car? | Report, Yn-question (Activity)
O: Yes, that’s the right car. | Yn-answer (Positive)
U: Okay. I am following it. | Report (Current Activity)

Note also the simple use of multi-modal interaction: in the first portion of the dialogue the user indicates the destination by clicking on a map in the accompanying GUI, and later the UAV uses the map and video to illustrate what it sees.
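The aggregated answer in Figure 1, "I am searching for a red car and flying to the tower", illustrates how reports about concurrent activities can be folded into one succinct utterance. A minimal sketch of such aggregation follows; the function name and phrasing templates are illustrative assumptions, not the system's actual generation component:

```python
# Sketch of utterance aggregation over concurrent activity reports.
# The phrasing templates below are hypothetical.
def aggregate_current(doing: list) -> str:
    if not doing:
        return "I am not doing anything at the moment."
    if len(doing) == 1:
        return "I am " + doing[0] + "."
    # Conjoin all but the last phrase with commas, then "and" before the last,
    # yielding one sentence instead of one utterance per activity.
    return "I am " + ", ".join(doing[:-1]) + " and " + doing[-1] + "."

assert aggregate_current(["searching for a red car", "flying to the tower"]) \
    == "I am searching for a red car and flying to the tower."
```

Aggregation of this kind directly serves the generation goals above: fewer, shorter utterances and less repetition for the operator to process.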